ABSTRACT

The need for improved measures is particularly acute 
in reading because, in spite of the magnitude of time and effort 
which continues to be invested in reading, there is no assurance that
the outcome is indeed proportionate to the effort involved. How much 
of the educational system's performance relative to its own goals is 
measured by a standardized test is unknown. Yet this is the kind of 
information which must be available to a school system if it is to 
make sound decisions on the effectiveness of its programs. The 
longitudinal evaluation study using criterion-referenced measures of 
important reading-related skills which is briefly described in the 
report is seen as offering a new model for test development which
allows for some user involvement in the construction process, as well 
as contributing significantly to the solution of problems raised in 
the context of this report. (Author/RC)
The national concern for greater productivity in education which has
characterized the 1970s has focussed attention on the need for developing not
only better instructional methods but also better measures of the outcome of
such instruction. The need for improved measures is particularly acute in
reading because, in spite of the magnitude of time and effort which continues
to be invested in reading, there is no assurance that the outcome is indeed
proportionate to the effort involved. To the contrary, studies of school
productivity frequently result in the policy implication that little can be
done to improve program effectiveness (Jensen, 1969).

The alternative implication, that the measures of outcome employed may be
inadequate to the task, is rarely drawn. In recent years, however,
Jencks (1972) and Bormuth (1970) have seriously questioned the relevance and
utility of the norm-referenced measures of achievement typically used in
studies of school productivity.

An important question with respect to standardized tests is their
relevance to the objectives of an educational system, or to the comparison of
programs and units within a system. Since the typical measure is referenced
to relative comparisons among persons in the standardization sample, it does
not refer specifically to the educational system's intended or possible
achievement. Nor does a particular student's obtained score on such a test
reflect the level of reading he has attained relative to the levels and demands
of reading materials in his educational system. How much of the educational
system's performance relative to its own goals is measured by a standardized
test is unknown. Yet this is the kind of information which must be available
to a school system if it is to make sound decisions on the effectiveness of its
programs.

The appropriate version of a norm-referenced test administered annually
to the same subjects will show whether a particular student has improved his
standing relative to the rest of the group. It will not show in absolute terms
the amount of growth which has taken place in a single student or in the group
as a whole. Still, test data might be used to yield some estimate of absolute
growth if the content of the tests tapped the same type of skills at successive
age levels. In his analysis of reading tests, however, Singer (1973) points
out that they subsume different types of skills, and that the relationship
between these types shifts with increasing age. Broadly speaking, in the
first four grades techniques of decoding, efficient oculomotor skills, and
other mechanical aspects of reading are emphasized. These skills, according
to Singer, reach an effective level of mastery by the fourth grade. Thereafter,
verbal and reasoning processes, which never reach a ceiling but continue to
develop indefinitely, assume a major role in reading performance. Singer sees
norm-referenced tests as confounding these two basic types of skills on which
the traditional distinction between "learning to read" and "reading to learn"
has been based. His solution is to use norm-referenced tests in two ways in
the first four grades. The first is the one typically used in schools, where
the student takes the level of a test appropriate for his grade or age and
receives a percentile score. The second is to administer an equivalent form
of the first-grade test each year to assess absolute growth in the skills
involved in "learning to read." On the same measure used in these two ways, the
same student may be seen over a period of four years to improve substantially
toward mastery level on the learning-to-read skills, but to improve little
relative to his peers on the ability to read for information.
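
The contrast between the two uses of the same measure can be illustrated with
a minimal sketch. It is not Singer's procedure; the function names, the
40-item test length, and all scores below are hypothetical values chosen only
to show how a percentile rank can stay flat while the proportion of
first-grade items mastered rises.

```python
def percentile_rank(score, cohort_scores):
    """Percentage of the cohort scoring strictly below `score`."""
    below = sum(1 for s in cohort_scores if s < score)
    return 100.0 * below / len(cohort_scores)


def mastery_proportion(raw_score, n_items):
    """Share of items answered correctly on the fixed first-grade form."""
    return raw_score / n_items


# Hypothetical data (not from the study): for grades 1 to 4, one student's
# raw score on the grade-appropriate test, the cohort's raw scores on that
# test, and the student's raw score on an equivalent first-grade form.
N_ITEMS_FIRST_GRADE_FORM = 40
YEARS = [
    # (grade, student score, cohort scores, score on first-grade form)
    (1, 18, [10, 15, 18, 22, 25, 30], 18),
    (2, 24, [16, 20, 24, 28, 31, 35], 27),
    (3, 29, [21, 26, 29, 33, 36, 39], 34),
    (4, 33, [25, 30, 33, 37, 40, 44], 38),
]

for grade, score, cohort, fixed_form_score in YEARS:
    rank = percentile_rank(score, cohort)
    mastery = mastery_proportion(fixed_form_score, N_ITEMS_FIRST_GRADE_FORM)
    print(f"grade {grade}: percentile rank {rank:.0f}, mastery {mastery:.2f}")
```

In this made-up case the student's standing among peers does not change from
year to year, while the repeated first-grade form shows steady movement toward
mastery, which is the distinction the two uses are meant to separate.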

Singer's solution to the problem of effective measurement in reading raises
a number of interesting issues. In the first place it seems to suggest that all,
or at least most, of the important decoding skills are assessed on a first-
grade test, although it seems more likely that many of these skills are not
introduced until second grade or even later. If this is so, it cannot be true
that most children reach complete mastery of these skills by fourth grade, and
indeed, research and experience suggest that it is not. A possible solution
to this problem might be to administer the second- and third-grade tests
repeatedly, in addition to the first-grade test. However, such a solution raises
a related question about the content of norm-referenced tests, especially
vis-a-vis their relation to intelligence tests. On the decoding skills, Singer
(1973b) points out, the correlation with intelligence decreases from first to
fourth grade, as more and more students approach mastery. To the extent that
the correlation fails to reach zero, we must conclude either that some of the
subjects have not attained the level of mastery assumed by Singer to be
universal by fourth grade, or that the test is also a function of verbal and
reasoning factors, or possibly both. Under these conditions, the use of
norm-referenced tests for diagnostic purposes in the first four grades, as
suggested by Singer, would seem to have some attendant problems.

After fourth grade, by contrast, the correlation with intelligence
continues to be high, especially when the reading task is difficult enough to
challenge even the brighter students. Singer concludes that beyond fourth
grade reading tests are systematically biased toward the kind of questions
that appear on intelligence tests, and that this bias increases with grade
level. Perhaps findings such as those of Coleman et al. (1966), which show
the effect of schooling becoming less and less, are related to this systematic
bias.

As a result of the high relationship between reading and intelligence tests,
a student's previous knowledge and experience tend to be a determining factor in
his performance on standardized reading tests. In fact, Simon (1970) has shown
that high levels of performance may be attained when the testee receives only
the questions, without the passages from which they are derived. Clearly, under
these conditions, the test has ceased to be a test of reading comprehension.

The concept of information gain has recently been offered as an alternative
which avoids the contamination of pretest knowledge in reading scores (Bormuth,
1971). Information gain is the amount of information an individual gains, as
measured by asking questions before and after reading the selection. By
adjusting the final score on the basis of pretest knowledge, an estimate of how
well the passage has been understood and processed is obtained. Even when this
technique is not employed, it becomes important, in the light of Simon's study,
to ensure that the passages used to test reading comprehension present
information which is essential if the testee is to answer the questions
correctly.
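
A minimal sketch of such a score follows. The text states only that the final
score is adjusted on the basis of pretest knowledge; the raw difference and the
normalized form shown here are assumptions made for illustration, not a
formula taken from Bormuth, and the function names and example scores are
hypothetical.

```python
def information_gain(pre_correct, post_correct):
    """Raw gain: questions answered correctly after reading minus before."""
    return post_correct - pre_correct


def normalized_gain(pre_correct, post_correct, n_questions):
    """Gain expressed as a share of what the reader did not already know.

    Assumption: this normalization is illustrative only; the source does
    not specify how the final score is adjusted for pretest knowledge.
    """
    unknown_before = n_questions - pre_correct
    if unknown_before == 0:
        return None  # every question was answered before reading
    return (post_correct - pre_correct) / unknown_before


# Two hypothetical readers with the same final score on a 10-question test.
reader_a = {"pre": 2, "post": 8}
reader_b = {"pre": 7, "post": 8}

for name, r in (("A", reader_a), ("B", reader_b)):
    raw = information_gain(r["pre"], r["post"])
    norm = normalized_gain(r["pre"], r["post"], 10)
    print(f"reader {name}: raw gain {raw}, normalized gain {norm:.2f}")
```

Both readers finish with eight correct answers, yet reader B could answer
seven of them without the passage, so the conventional score overstates how
much was obtained from reading.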

Among experts in the reading field who have made comprehension their area
for special study, agreement has yet to be reached on whether it consists of a
unitary factor, five, or any other number of factors (Davis, 1971). For the
purposes of summative evaluation, it may not be important. For diagnostic and
instructional purposes, however, analysis of comprehension into its component
skills seems essential. Further research in cognitive psychology and
psycholinguistics may demonstrate the importance of other factors, resulting in
modification and refinement of the concept. Meanwhile, our definition must
reflect current knowledge and armchair theorizing. From the practical
standpoint, it appears that most teachers agree as to which skills are important
and the grade level at which they should be taught. Since norm-referenced tests
are not designed to assess these skills in isolation, they cannot be used for
the ongoing assessment of the effectiveness of the school's program for teaching
the essential skills. Criterion-referenced tests based on behaviorally stated
objectives which describe the specific skills in unambiguous terms appear much
more suitable for the longitudinal monitoring of those skills which a school
system deems to be an integral part of its reading program.

The various considerations which have been outlined above led to plans for
the construction and testing of a new assessment model for the use of
cooperating school districts in New York State. In the first phase of the study,
which has been described extensively elsewhere (O'Reilly, 1973), a Bank of
Reading Objectives (BRO), consisting of some 2000 objectives grouped into six
areas (multisensory readiness, decoding, vocabulary, comprehension, location and
study skills, and reading in the content areas) was compiled by a team of
reading research and curriculum experts. To date, efforts have been concentrated
in two of these areas, vocabulary and comprehension. From the objectives for
vocabulary and comprehension, each of the nine cooperating school districts
selected those which were most relevant to its reading program, determined the
grade level or levels at which each objective should be taught and tested, and
indicated the relative importance of each objective by designating the number
of test items to be constructed. The complete test was designed to be
administered in a period of 30 minutes. In order to allow for the continuous
monitoring which was a major goal of the project, five equivalent forms of each
test were constructed and administered in the pilot phase at intervals of two
to three weeks between March and June, 1974. Subjects were randomly assigned to
the five forms in such a way that ultimately every student took all five forms
of the tests. Initially, the study focussed on grades 4 through 6, but since
the range of achievement within these three grades was considerably greater
than three years, tests were constructed at seven levels, corresponding roughly
to grades 1 through 7, each student being assigned by his teacher to the level
at which he was achieving.
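
One way to obtain an assignment with this property is sketched below. The
report does not describe the actual assignment procedure; the random starting
form followed by a fixed rotation is an assumption, and the form labels,
student identifiers, and function name are hypothetical.

```python
import random

FORMS = ["A", "B", "C", "D", "E"]  # hypothetical labels for the five forms


def assign_forms(student_ids, seed=None):
    """Give each student a random starting form, then rotate through the rest.

    Over five testing sessions every student meets each form exactly once.
    Assumption: the rotation scheme is illustrative, not the study's method.
    """
    rng = random.Random(seed)
    schedule = {}
    for sid in student_ids:
        start = rng.randrange(len(FORMS))
        schedule[sid] = [
            FORMS[(start + session) % len(FORMS)]
            for session in range(len(FORMS))
        ]
    return schedule


if __name__ == "__main__":
    for sid, forms in assign_forms(["s01", "s02", "s03"], seed=1974).items():
        print(sid, " ".join(forms))
```

Drawing an independent random ordering of the five forms for each student
would satisfy the same requirement; the rotation is used here only because it
keeps the sketch short.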

A data-processing system was devised to complement the testing program.
The feedback provided to the schools included group data on each item, each
objective, and total scores at each level. In addition, every student received
information on his own performance on the same measures. The total system,
which is known as Comprehensive Achievement Monitoring (CAM)*, has several
advantages: (1) It permits diagnostic evaluation of individual and group
strengths and weaknesses on specific skills, thus enabling the teacher to
deploy instruction time more effectively. (2) It allows for the continuous
monitoring of every student on each skill, and is therefore a useful tool in
the individualization of instruction. (3) It facilitates flexible ad hoc
grouping for the teaching of specific skills on which identified students need
further tutoring. (4) As additional data are gathered, correlational studies
should indicate empirically which of the skills are indeed most important to
the criterion of reading comprehension.

From our experiences with this initial pilot project, several modifications
of the system have evolved. (1) In the first place, we found that the
distinction between vocabulary and comprehension was not as clear-cut as it may
appear on the surface. In fact, it was difficult to assign some of the skills to
one or the other category. For this reason, they were combined into a single
test of comprehension. (2) In order to provide for still further input on the
part of the cooperating teachers, a pool of items for each objective was
constructed, from which they could select items to construct their tests, within
the limitations imposed by the research design. Construction of these items is
still in process, and the final product will be in the form of a Test
Development Notebook consisting of 800 pages containing 4000 items referenced to
the 150 objectives which were deemed most important by consensus of the
participating school districts. Since the original seven levels of the pilot
phase have also been extended to 20 ordinal levels with two levels per grade for
grades 1 through 10, the notebook permits the construction by the school
district of at least 100 test forms, distributed in units of five forms at each
of the 20 levels. (3) Selection of passages on which the test questions were
based followed three criteria: (a) sampling of passages was made so as to
represent the universe of reading materials at each age level, including
instructional and leisure reading materials; (b) care was taken to ensure that
the questions were unbiased in terms of previous information; (c) passages were
calibrated for level of difficulty using a combination of the Dale-Chall
readability formula and the Harris-Jacobson word lists. The outcome of these
modifications following the field-testing phase is expected to be a valid
instrument which is both relevant to school evaluation needs and sensitive to
differential growth in the various skills subsumed under the rubric of reading
comprehension.

*CAM was developed by National Evaluation Systems, Inc. of Boston and Palo Alto.

A question of major interest in this project concerns the relative
sensitivity of norm-referenced and criterion-referenced tests as measures of the
influence of contributing school factors. Preliminary answers to this question
are based on a series of multiple regression equations using the norm- and
criterion-referenced measures as criteria, and student, teacher, and
process variables as predictors. When the newly modified measures are
available, it is anticipated that school factors will contribute more strongly
to performance on the criterion-referenced measures. A related concern is
the extent to which the criterion-referenced tests are perceived by students
and teachers as fairer, less threatening, and more informative than
conventional formal tests.
The longitudinal evaluation study using criterion-referenced measures of
important reading-related skills which has been briefly described here is seen
as offering a new model for test development which allows for some user
involvement in the construction process, as well as contributing significantly
to the solution of problems raised in the context of this report.